Application No. 09/677,120 

Amendment dated June 23, 2004 

Reply to Office Action dated March 25, 2004 

Express Mail EV406652085US 

Amendments to the Specification:

Please replace paragraph 6 at page 5, lines 26 through page 6, line 4 with the following 
amended paragraph: 

Referring to Fig. 1, an overview of a multi-processing data processing system 150 is 
depicted. For clarity and ease of presentation, an eight-processor system has been depicted. As 
will be readily appreciated by those of ordinary skill in the art, the invention is applicable to 
multi-processor systems having other numbers of processors. It is also not necessary that each 
processor group have four members, or that all processor groups have the same number of 
processors. 

Please replace paragraph 2 at page 7, lines 5-17 with the following amended paragraph: 

Looking again at Fig. 1, the various memory components within system 150 can be 
conceptualized as comprising three processing levels. Each level can be considered to contain 
one or more "instances" or nodes in the CPU/cache/shared memory hierarchy. Level 0 contains 
eight level 0 instances, or in the case of four locales or blocks, sixteen level 0 instances, each 
instance being one of the CPUs 100-107 and its associated caches 108. Level 1 contains two 
level 1 instances, each comprising four level 0 instances, and a secondary cache. While eight 
processors are shown, the system can be arranged with more processors, for example, as a 
sixteen processor system. In the case of a sixteen processor system, level 1 contains four level 1 
instances, each comprising four level 0 instances and a secondary cache. Finally, there is a 
single level 2 instance containing the two level 1, or as applicable, the four level 1, instances and 
the shared system memory. 
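
For illustration only, the three-level hierarchy described above can be sketched in Python. The class names, the function build_hierarchy, and the zero-based CPU indices are hypothetical and do not appear in the specification. The sketch also assumes equal groups of four processors, which the specification states is not required.

```python
from dataclasses import dataclass
from typing import List

# Assumption: every level 1 instance groups four level 0 instances,
# as in the depicted eight-processor and sixteen-processor arrangements.
CPUS_PER_LEVEL1 = 4


@dataclass
class Level0:
    """One CPU together with its associated caches."""
    cpu_id: int


@dataclass
class Level1:
    """A group of level 0 instances sharing one secondary cache."""
    index: int
    cpus: List[Level0]


@dataclass
class Level2:
    """The single top instance: all level 1 instances plus shared memory."""
    groups: List[Level1]


def build_hierarchy(num_cpus: int) -> Level2:
    """Build the level 0 / level 1 / level 2 structure for num_cpus CPUs."""
    if num_cpus <= 0 or num_cpus % CPUS_PER_LEVEL1 != 0:
        raise ValueError("this sketch assumes a positive multiple of four CPUs")
    groups = []
    for g in range(num_cpus // CPUS_PER_LEVEL1):
        first = g * CPUS_PER_LEVEL1
        cpus = [Level0(cpu_id=c) for c in range(first, first + CPUS_PER_LEVEL1)]
        groups.append(Level1(index=g, cpus=cpus))
    return Level2(groups=groups)
```

Under these assumptions, build_hierarchy(8) yields two level 1 instances and build_hierarchy(16) yields four, matching the counts given in the paragraph above.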

Please replace paragraph 4 at page 7, lines 27 through page 8, line 14 with the following 
amended paragraph: 

More specifically, a job will take the least amount of time if it is always run on the same 
processor. This is because of the multiple levels of memory caches used to reduce memory 
latency. In general, the lower the cache level, the less time it takes to access the memory. When 
a job is run on a processor, its working set of memory is loaded into the applicable cache 108 of 
one of processors 100-107. On a NUMA system, there is also another level of cache 110 or 111 
that is shared between the four processors 100-103 or 104-107 of a functional block. In the best 
case scenario, the job is run on the processor which has the required code and data in its cache. 
The next best case scenario is to run the job on another processor on the same NUMA block. 
The tradeoff in this circumstance is between waiting for the processor on which the job last ran 
to become free, or running the job on another idle processor. If time is allowed for the last 
processor, the idle processor's cycles will be wasted. If the job is run on a different processor, 
time and bus bandwidth may be wasted filling the cache of the idle processor and pulling all of 
the modified memory from the cache of the original processor. As noted previously, although 
the system and method are being described with reference to two blocks or locales 0 and 1, it 
will be appreciated that it can be implemented with, for example, sixteen CPUs, namely CPUs 0-3 
in a locale 0, CPUs 4-7 in a locale 1, CPUs 8-11 in a locale 2, and CPUs 12-15 in a locale 3. 
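
As a minimal sketch of the preference order described above, the following Python maps a CPU to its locale and ranks idle candidates by cache affinity. The function names are hypothetical. Ranking a processor in a different locale last is an inference from the text, not an express statement of it. The sketch does not model the choice of waiting for the last processor to become free.

```python
from typing import Iterable, Optional

# Assumption: four CPUs per locale, as in the sixteen-CPU example
# (CPUs 0-3 in locale 0, 4-7 in locale 1, 8-11 in locale 2, 12-15 in locale 3).
CPUS_PER_LOCALE = 4


def locale_of(cpu: int) -> int:
    """Return the locale (NUMA block) that contains the given CPU."""
    return cpu // CPUS_PER_LOCALE


def affinity_rank(last_cpu: int, candidate: int) -> int:
    """Lower is better: 0 same processor, 1 same locale, 2 other locale."""
    if candidate == last_cpu:
        return 0
    if locale_of(candidate) == locale_of(last_cpu):
        return 1
    return 2


def pick_idle_cpu(last_cpu: int, idle_cpus: Iterable[int]) -> Optional[int]:
    """Choose the idle CPU with the best affinity to the job's last CPU.

    Ties are broken by the lowest CPU number. Returns None when no CPU
    is idle.
    """
    candidates = list(idle_cpus)
    if not candidates:
        return None
    return min(candidates, key=lambda c: (affinity_rank(last_cpu, c), c))
```

For a job that last ran on CPU 5, with CPUs 0, 6, and 12 idle, the sketch selects CPU 6, because it shares locale 1 and therefore the second-level cache with CPU 5.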

Please replace paragraph 2 at page 8, lines 15-21 with the following amended 
paragraph: 

Turning now to the specific disclosure in the drawing, each processor 100-107 keeps 
track of the time from when it first went busy. It is intended that a process is poached when the 
processor on which it is scheduled is too busy to run the process. Accordingly, once a 
predetermined amount of time has elapsed during which a processor continues to be busy, other 
processors that have gone idle are then free to poach the process from the busy processor. 
Each processor 100-107 has a timer for this purpose. 
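
The per-processor busy timer described above can be sketched as follows, by way of example only. The class Processor, its method names, and the threshold value are hypothetical. The specification says only that the amount of time is predetermined, so the value of ten ticks is an arbitrary placeholder.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical placeholder: the specification does not give a value.
POACH_THRESHOLD_TICKS = 10


@dataclass
class Processor:
    cpu_id: int
    # Tick at which the processor first went busy; None while idle.
    busy_since: Optional[int] = None

    def go_busy(self, now: int) -> None:
        """Record the start of a busy period; later calls do not reset it."""
        if self.busy_since is None:
            self.busy_since = now

    def go_idle(self) -> None:
        """Clear the timer when the processor becomes idle."""
        self.busy_since = None

    def poachable(self, now: int,
                  threshold: int = POACH_THRESHOLD_TICKS) -> bool:
        """True once the processor has been continuously busy for at least
        the threshold, so that idle processors may poach from it."""
        return self.busy_since is not None and now - self.busy_since >= threshold
```

In this sketch an idle processor would call poachable on a busy processor before taking a process scheduled there. The timer restarts only after the busy processor has itself gone idle.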